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Abstract 

We propose several sampling architectures for the efficient acquisition of an ensemble of correlated 
signals. We show that without prior knowledge of the correlation structure, each of our architectures 
(under different sets of assumptions) can acquire the ensemble at a sub-Nyquist rate. Prior to sampling, 
the analog signals are diversified using simple, implementable components. The diversification is achieved 
by injecting types of “structured randomness” into the ensemble, the result of which is subsampled. 
For reconstruction, the ensemble is modeled as a low-rank matrix that we have observed through an 
(undetermined) set of linear equations. Our main results show that this matrix can be recovered using 
standard convex programming techniques when the total number of samples is on the order of the intrinsic 
degree of freedom of the ensemble — the more heavily correlated the ensemble, the fewer samples are 
needed. 

To motivate this study, we discuss how such ensembles arise in the context of array processing. 


1 Introduction 


This paper demonstrates how an ensemble of correlated signals can be efficiently sampled using three different 
implementable architectures. For each of these architectures we derive a sampling theorem that relates the 
bandwidth and the (a priori unknown) correlation structure to the total number of samples per second 
needed to fully capture the ensemble. 

We consider ensembles of signals output from M sensors, each of which is bandlimited to frequencies below 
W /2 (see Figure [l]). The entire ensemble can be acquired by taking W uniformly spaced samples per second 
in each channel, for a total sampling rate of MW. We will show that if the signals are correlated, meaning 
that the ensemble can be written as (or closely approximated by) distinct linear combinations of R <C M 
latent signals, then this net sampling rate can be reduced to the order of ~ RW using coded acquisition. 
The sampling architectures we propose are blind to the correlation structure of the signals; this structure is 
discovered as the signals are reconstructed. 


Each architecture involves a different type of analog diversification which ensures that the signals are suf¬ 
ficiently “spread out” so each point sample captures information about the ensemble. Ultimately, what is 
measured are not actual samples of the individual signals, but rather are different linear combinations that 
combine multiple signals and capture information over an interval of time. In Section 2.4.1 , we will show that 
these samples can be expressed as linear measurements of a low-rank matrix. Over the course of one second, 
we want to acquire an M x W matrix comprised of samples of the ensemble taken at the Nyquist rate. The 
proposed sampling architecture produces a series of linear combinations of entries of this matrix. Conditions 
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and a grant from the Packard Foundation. 
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under which a low-rank matrix can be effectively recovered from an underdetermined set of linear measure¬ 
ments have been the object of intense study in the recent literature jljjl]; the mathematical contributions in 
this paper show how these conditions are met by systems with clear implementation potential. 


Our motivation for studying these architectures comes from classical problems in array signal processing. In 
these applications, one or more “narrowband” signals are measured at multiple sensors at different spatial 
locations. While narrowband signals can have significant bandwidth, they are modulated up to a high carrier 
frequency, making them very heavily spatially correlated as they arrive at the array. This correlation, which 


we review in more detail in Section 1.2, can be systematically exploited for spatial filtering (beamforming), 
interference removal, direction-of-arrival estimation, and multiple source separation. These activities all 
depend on estimates of the inter-sensor correlation matrix, and the rank of this matrix can typically be 
related to the number of sources that are present. 

Compressive sampling has been used in array processing in the past: sparse regularization was used for 
direction of arrival estimation [5j|7] long before any of the “sub-Nyquist” sampling theorems started to make 
the theoretical guarantees concrete [&-10 


These results (along with more recent works including 11—-13] 


show how exploiting the structure of the array response in free space (for narrowband signals, this consists 
of samples of a superposition of a small number of sinusoids) can be used to either super-resolve the DOA 
estimate or reduce the number of array elements required to locate a certain number of sources. A single 
sample is associated with each sensor, and the acquisition complexity scales with the number of array 
elements. 


In this paper, we exploit this structure in a different way. Our goal is to completely reconstruct the time- 
varying signals at all the array elements. The structure we impose on this ensemble is more general than the 
spatial spectral sparsity in this previous work; we ask that the signals are correlated in some a priori unknown 
manner. Our ensemble sampling theorems remain applicable even when the array response depends on the 
position of the source in a complicated way. Moreover, our reconstruction algorithms are indifferent to what 
the spatial array response actually is, as long as the narrowband signals remain sufficiently correlated. 


The paper is organized as follows. In Sections o and |1.2| we describe the signal model and its motivation 


from problems in array processing. In Section 1.3, we introduce the components (and their corresponding 


mathematical models) that we will use in our sampling architectures. In Section [2] we present the sampling 
architectures, show how the measurements taken correspond to generalized measurements of a low-rank 
matrix, and state the relevant sampling theorems. Numerical simulations, illustrating our theoretical results, 
are presented in Section [3] Finally, Section [5| and Section [5] provide the derivation of the theoretical results. 


1.1 Signal model 

Our signal model is illustrated in Figure]!] We use X c (t) to denote the continuous-time signal ensemble we are 
trying to acquire, and x\ (t),... ,xmW to denote the individual signals within that ensemble. Conceptually, 
we may think of X c (t) as a “matrix” with a finite number M of rows, with each row containing a bandlimited 
signal. Our underlying assumption is that at every time £, the vector X c (t) G M m lies in a fixed subspace S 
of dimension R. We write 

X c (t) « AS c (t), (1) 

where S c (t) is a smaller signal ensemble with R signals and A is a M x R matrix with entries A[m, r] whose 
columns span <S. We will use the convention that fixed matrices operating to the left of the signal ensembles 
simply “mix” the signals point-by-point, and so ([!]) is equivalent to 

R 

(o«E A[m, r]s r (t). 

r= 1 


The only structure we will impose on the individual signals is that they are real-valued, bandlimited, and 
periodic. With this model, the signals live in a finite-dimensional linear subspace and there is a natural way 

2 


Draft by A. Ahmed and J. Romberg - January 28, 2015 - 1:24 









x c (t) 

. M^a/vwwvv^ 
V\M/vw\M/yv- 


M 


► i/\M/W\M/W\/v 
iA/wwwwWv 
V^wwvWVV 
vwwvvvwyw 


A 

S c (t) 


X 

A 

s 


AWWWVWVW 

R 





aaA/WWVWWV' , 


A[iJ] 


Mhi] 


M 

. 4. J. r, - j. a it i - 

w 



v n lf V-v v ■» <r n- u v ? — 





H l‘» - 1* ‘I- *r « * - » 1 



- 



- -1- n 1 V »- .1 -n u 1 V V * 1 




TV 


(a) (b) 


Figure 1: (a) Our model is that an ensemble of continuous-time signals are correlated, meaning the M signals can 
be closely approximated by a linear combination of R underlying signals. We can write the M signals in X c (t) as 
a tall matrix (capturing the correlation structure) multiplied by an ensemble of R latent signals, (b) The matrix of 
samples inherits the low-rank structure of the continuous-time ensemble. 


to discretize the problem; that is, what exists in X c (t) for t G [0,1] is all there is to know, and each signal 
can be captured exactly with W equally-spaced samples, which, for the most part, reduces the clutter in 
mathematics. Each bandlimited, periodic signal in the ensemble can be written as 

B 

]T a m [k\e 2 * kt , (2) 

k= — B 

where the a m [k] are complex but are symmetric, a m [—k] = a m [k]*, to ensure that x m (t) is real. We can 
capture x m (t) perfectly by taking W = 2B-\-l equally spaced samples per row. We will call this the M xW 
matrix of samples X ; knowing every entry in this matrix is the same as knowing the entire signal ensemble. 
We can write 

X = CF n , (3) 

where F is a W xW normalized discrete Fourier matrix with entries 


F[k, n] = _L e -i27rfcn/iy 0<n<W-l, - B < k < B, 

vw 

and C is a M x W matrix whose rows contain Fourier series coefficients for the signals in X c (t). F is 
orthonormal, while C inherits the correlation structure of the ensemble X c (t). 


Same correlated signal model was considered in 114 for compressive sampling of multiplexed signals. Two 


multiplexing architectures were proposed and for each a sampling theorem was proved that dictated minimum 
number of samples for exact recovery of the signal ensemble. This paper presents sampling architectures, 
where we use a separate ADC for each channel and rigorously prove that ADCs can operate at roughly 
the optimal sampling rate to guarantee signal recovery. Other types of correlated signal models have been 


exploited previously to achieve gains in the sampling rate. For example, 15 shows that two signals related by 


a sparse convolution kernel can be reconstructed jointly at a reduced sampling rate. The signal model in 16 


considers multiple signals all of which live in a fixed subspace spanned by a subset of the basis functions of 
a known basis, and provides some abstract results that show that the sampling rate required to successfully 
recover the signals scales with the number of basis functions used in the construction of the signals. In this 
paper, we also show that the sampling rate scales with the number of independent latent signals but we do 
this without the knowledge of the basis. For a more applied treatment of the results with similar flavor as 
in 16 , we refer the reader to fl7 -19 . 


As will be shown later, we observe the signal ensemble X c (t) through a limited set of random projections, 
and recover the latent signals, and the subspace from these few random projections by solving a nuclear 
norm minimization program. A related work 20 considers the case when given a few random projections 


of a signal, we find out the subspace to which it belongs by solving a series of least-squares programs. 


We end this section by noting that their are many ways this problem might be discretized. Using Fourier series 
is convenient in two ways: we can easily tie together the notion of a signal being bandlimited with having 
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a limited support in Fourier space, and our sampling operators have representations in Fourier space that 
make them more straightforward to analyze. In practice, however, we might choose to represent the signal 
over a finite interval using any one of a number of basis expansions — the low rank structure is preserved 
under any linear representation. It is also possible that we are interested in performing the ensemble recovery 
over multiple time frames, and would like the recovery to transition smoothly between these frames. For this 
we might consider a windowed Fourier series representations (e.g. the lapped orthogonal transform in |2l]) 
that are carefully designed so that the basis functions are tapered sinusoids (so we again get something close 
to bandlimited signals by truncating the representation to a certain depth) but remain orthonormal. It is 
also possible to adjust our recovery techniques to allow for measurements which span consecutive frames, 
yielding another natural way to tie the reconstructions together. 


1.2 Applications in array signal processing 

One application area where low-rank ensembles of signals play a central role is array processing of narrowband 
signals. In this section, we briefly review how these low-rank ensembles arise. The central idea is that 
sampling a wavefront at multiple locations in space (as well as in time) leads to redundancies which can be 
exploited for spatial processing. These concepts are very general, and are common to applications as diverse 
as surveillance radars, underwater acoustic source localization and imaging, seismic exploration, wireless 
communications. 

The essential scenario is that multiple signals are emitted from different locations, each of the signals occupies 
the same bandwidth of size W which has been modulated up to a carrier frequency f c . The signals observed 
by receivers in the array are, to a rough approximation, complex multiples of one another. To a very 
close approximation, the observed signals he in a subspace with dimension close to one — this subspace is 
determined by the location of the source. This redundancy between the observations at the array elements 
is precisely what causes the ensemble of signals to be low rank; the rank of the ensemble is determined by 
the number of emitters. The only conceptual departure from the discussion in previous sections, as we will 
see below, is that each emitter may be responsible for a subspace spanned by a number of latent “signals” 
that is greater than one (but still small). 

Having an array with a large number of appropriately spaced elements can be very advantageous even when 
there only a relatively small number of emitters present. Observing multiple delayed versions of a signal 
allows us to perform spatial processing, we can beamform to enhance or null out emitters at certain angles, 
and separate signals coming from different emitters. The resolution to which we can perform this spatial 
processing depends on the number of elements in the array (and their spacing). 

The main results of this paper do not give any guarantees about how well these spatial processing tasks can 
be performed. Rather, they say that the same correlation structure that makes these tasks possible can be 
used to lower the net sampling rate over time. The entire signal ensemble can be reconstructed from this 
reduced set of samples, and spatial processing can follow. 

We now discuss in more detail how these low rank ensembles come about. For simplicity, this discussion 
will center on linear arrays in free space. As we just need the signal ensemble to lie in a low dimensional 
subspace, and do not need to know what this subspace may be beforehand, the essential aspects of the model 
extend to general array geometries channel responses and frequency-selective/multipath channels. 

Suppose that a signal is incident on the array (as a plane wave) at an angle 0. Each array element observes 
a different shift of this signal — if we denote what is seen at the array center (the origin in Figure [2^a)) 
by s(£), then an element m at distance d m from the center sees x m (t) = s(t — (d m /c) sin#). If the signal 
consists of a single complex sinusoid, s(t ) = e j27r ^ t , then these delays translated into different (complex) 
linear multiples of the same signal, 


Xrn (t ) = e ~j 27T f r m sin(6)/c e j27Tft^ 


(4) 
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In this case, the signal ensemble has ranlj^] 1; we can write X(t) = a($, /)e :/27r ^ t , where a($, /) is an M- 
dimensional steering vector of complex weights given above. 


This decomposition of the signal ensemble makes it clear how spatial information is coded into the array 
observations. For instance, standard techniques 22|23 for estimating the direction of arrival involve forming 
the spatial correlation matrix by averaging in time, 


R 


XX 


1 N 

-]Tx(* n )x(* n ) H . 


As the column space of R xx should be a($, /), we can correlate the steering vector for every direction to see 
which one comes closest to matching the principal eigenvector of R xx . 

The ensemble remains low rank when the emitter has a small amount of bandwidth relative to a larger carrier 
frequency. If we take s(t) = Sb(t) e j27r ^ ct , where Sb(t) is bandlimited to IF/2, then when IF <C / c , the a(0, /) 
for / G [f c — W/2,f c + IF/2] will be very closely correlated with one another. In the standard scenario 
where the array elements are uniformly spaced c/(2/ c ) along a line, we can make this statement more 
precise using classical results on spectral concentration 24][25]. In this case, the steering vectors a(0, /) 
for / £ [f c ± IF/2] are equivalent to integer spaced samples of a signal whose (continuous-time) Fourier 
transform is bandlimited to frequencies in (1 =b W/(2/ c ))(sin0)/2, for a bandwidth less than W/(2f c ). Thus 
the dimension of the subspace spanned by { a(9 , /), / £ [/ c =bIF/2]} is, to within a very good approximation, 
~ MW/f c + 1. 

Figure [2|b) illustrates a particular example. The plot shows the (normalized) eigenvalues of the matrix 

rfc+W/ 2 

Raa= a(e,f)a(e,f) H df, ( 5 ) 

Jfc-W/2 

for the fixed values of f c = 5 GHz, W = 100 MHz, c equals the speed of light, M = 101, and 9 = 7r/4. We 
have MW/ f c + 1 = 3.02, and only 3 of the eigenvalues are within a factor of 10 4 of the largest one. 

It is fair, then, to say that the rank of the signal ensemble is a small constant times the number of narrow 
band emitters. 


1.3 Architectural components 


In addition to analog-to-digital converters, our proposed architectures will use three standard components: 
analog vector-matrix multipliers, modulators, and linear time-invariant filters. The signal ensemble is passed 
through these devices, and the result is sampled using an analog-to-digital converter (ADC) taking either 
uniformly or non-uniformly spaced samples — these samples are the final outputs of our acquisition archi¬ 
tectures. 


The analog vector-matrix multiplier (AVMM) produces an output signal ensemble AX c (t ) when we input 
it with signal ensemble X c (t), where A is an N x M matrix whose elements are fixed. Since the matrix 
operates pointwise on the ensemble of signals, sampling output AX c (t) is the same as applying A to matrix 
X of the samples (i.e., sampling commutes with the application of A). Recently, AVMM blocks have been 
built with hundreds of inputs and outputs and with bandwidths in the tens-to-hundreds of megahertz [26|, 27j 


We will use the AVMM block to ensure that energy disperses more or less evenly throughout the channels. If 
A is a random orthogonal transform, it is highly probable that each signal in AX c (t ) will contain about the 
same amount of energy regardless of how the energy is distributed among the signals in X c (t) (formalized 
in Lemma [l] below), allowing us to deploy equal sampling resources in each channel while ensuring that 
resources on channels that are “quiet” are not being wasted. 


1 We are using complex numbers here to make the discussion go smoothly; the real part of the signal ensemble is rank 2, 
having a cos(27r ft) and a sin(27r/t) term. 
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Figure 2: (a) A plane wave impinges on a linear array in free space. When the wave is a pure tone in time, then the 
responses at each element will simply be phase shifts of one another, (b) Eigenvalues for R aa , on a log 10 scale and 
normalized so that the largest eigenvalue is 1, defined in 0 for an electromagnetic signal width a bandwidth of 100 
MHz and a carrier frequency of 5 GHz; the array elements are spaced half a carrier-wavelength apart. Even when the 
signal has an appreciable bandwidth, the signals at each of the array elements are heavily correlated — the effective 
dimension in this case is R = 3 or 4. 
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Figure 3: (a) The analog vector-matrix multiplier (AVMM) takes random linear combinations of M input signals to 
produce N output signals. The action of AVMM can be thought of as the left multiplication of random matrix A to 
ensemble X c (t). Intuitively, this operation amounts to distributing energy in the ensemble equally across channels, 
(b) Modulators multiply a signal in analog with a random binary waveform that disperses energy in the Fourier 
transform of the signal, (c) Random LTI filters randomize the phase information in the Fourier transform of a given 
signal by convolving it with h c {t) in analog, which distributes energy in time, (d) Finally, ADCs convert an analog 
stream of information in discrete form. We use both uniform and non-uniform sampling devices in our architectures. 
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The second component of the proposed architecture is the modulators, which simply take a single signal 
x(t) and multiply it by fixed and known signal d c (t). We will take d c (t ) to be a binary ±1 waveform that 
is constant over time intervals of a certain length 1/IF. That is, the waveform alternates at the Nyquist 
sampling rate. If we take W samples of d c (t)x(t) on [0,1], then we can write the vector of samples y as 

y = Dx, (6) 

where x is the IF-vector containing the Nyquist-rate samples of x(t) on [0,1], and D is an W x W diagonal 
matrix whose entries are samples d G of d c (t). We will choose a binary sequence that randomly generates 
d c (t), which amounts to D being a random matrix of the following form: 


D = 


4 1] 


where d[n\ = ±1 with probability 1/2, 


d[W - 1] 


(7) 


and the d[n\ are independent. Conceptually, the modulator disperses the information in the entire band of 
x(t) — this allows us to acquire the information at a smaller rate by filtering a sub-band as will be shown 
in Section [2] 


Compressive sampling architectures based on the random modulator have been analyzed previously in the 
literature [l8j|28]. The principal finding is that if the input signal is spectrally sparse (meaning the total size 
of the support of its Fourier transform is a small percentage of the entire band), then the modulator can be 
followed by a low-pass filter and an ADC that takes samples at a rate comparable to the size of the active 
band. This architecture has been implemented in hardware in multiple applications 17 29-32 . 


The third type of component we will use to preprocess the signal ensemble is a linear time-invariant (LTI) 
filter that takes an input x(t) and convolves it with a fixed and known impulse response h c (t). We will 
assume that we have complete control over h c (t ), even though this brushes aside admittedly important 
implementation questions. Because x(t) is periodic and bandlimited, we can write the action of the LTI 
filter as a IT x IT circular matrix H operating on samples x (the first row of H consists of samples h of 
h c (t)), that is, y = Hx, where y is the IF Nyquist samples of the signal obtained at the output of the filter. 
We will make repeated use of the fact that H is diagonalized by the discrete Fourier transform: 


H = F r H F, 


( 8 ) 


where F is the W x W normalized discrete Fourier matrix with entries, and H is a diagonal matrix whose 
entries are h = VWFh. The vector h is a scaled version of the non-zero Fourier series coefficients of h c (t). 


To generate the impulse response, we will use a random unit-magnitude sequence in the Fourier domain . In 
particular, we will take 


p(0) 


H = 


k i) 


(9) 


h(W-l)_ 


where 


h(uS) 



±1, with prob. 1/2, 

e J<9 “, with 0 U ~ Uniform([0, 27r]), 


h(W-u>)*, 


uj = 0 

1 <cj < (IF - l)/2 
(IF + l)/2 <cj < W-l 


( 10 ) 


These symmetry constraints are imposed so that h (and hence, h c {t)) is real-valued. Conceptually, convolu¬ 
tion with h c (t ) disperses a signal over time while maintaining fixed energy (note that H is an orthonormal 
matrix). 


Convolution with a random pulse followed by sub-sampling has also been analyzed in the compressed sensing 
literature 33 - 36 . If the random filter is created in the Fourier domain as above, then following the filter with 
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an ADC that samples at random locations produces a universally efficient compressive sampling architecture 
— the number of samples that we need to recover a signal with only S active terms at unknown locations 
in any fixed basis scales linearly in S and logarithmically in ambient-dimension W. 


2 Main Results: Sampling Architectures 


This section presents the main results and their implications for the problem of efficient sampling of the 
ensemble of correlated signals. We start with a straightforward architecture in Section |2T] that minimizes 
the sample rate when the correlation structure is known. We then combine our components from the last 
section in different ways to create architectures that are provably effective under different assumptions on 
the signal ensemble. The first architecture in Section [23| consists simply of a bank of non-uniform ADCs; 
this architecture is provably effective when the ensemble is spread approximately evenly across channels and 
in time. The second architecture in Section 2.4 uses a random pre-modulator in front of a uniform ADC; 
this architecture is again effective when the energy in the ensemble is approximately uniform in both time 
and across the array. In Section |2.5| we present two architectures, consists of a AVMM, LTI filters, and 


either non-uniform ADCs or modulators coupled with uniform ADCs, that are universal in that they work 
for any type of correlation structure. 


2.1 Fixed projections for known correlation structure 

If the mixing matrix A for ensemble X c (t) is known, then a straightforward way exists to sample the 
ensemble efficiently. Let A = UT,V t be the singular value decomposition of A, where U is M x R matrix 
with orthogonal columns, 5] is Rx R diagonal matrix, and V is W x R with orthogonal columns. An efficient 
way is to whiten ensemble A with U T and sample the resulting R signals (each at rate W). This scheme is 
shown in Figure [4j X can be written as a multiplication of matrix U and Rx W matrix Y, which contains 
the Nyquist samples of signals X \(£),... ,##(£) respectively in its R rows. The signal ensemble can the be 
obtained using the multiplication 

X = UY. 

Therefore, if we know the correlation structure E7, then X and hence X c (t) (using sine interpolation of 
samples in X) can be recovered from the optimal total sampling rate of RW samples per second. 
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Figure 4: If we know the correlation structure then efficient sampling structure is to whiten with U T and then 
sample, which requires R ADCs, each operating at a rate W samples per second. Hence, ADCs take a total of RW 
samples per second, RW being the degrees of freedom in the R signals bandlimited to W/2. 

In many interesting applications, the correlation structure of the ensemble X c {t) is not known at the time 
of acquisition. In this paper, we design sampling strategies that are blind to the correlation structure but 
still achieve a near optimal sampling rate by introducing AVMMs, filters, and modulators. The randomness 
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introduced by these components disperses energy over time and across channels so that the ADCs are always 
sensing information; this injection of structured randomness allows an efficient use of sampling resources. 


2.2 Matrix recovery 


Given the generalized samples of the ensemble, we recover (the discretized) Xq using a standard convex pro¬ 
gram. We denote the linear operator that maps the ensemble to the samples as A(- • •), and our measurements 
as 

y = A(X 0 ), yeR L , X 0 €R MxW . 

To recover Xq from y , we solve we solve 

nun \\ X L ( n ) 

subject to y = A(X) 

where \\X\\* is the nuclear norm (the sum of the singular values of X). Minimizing the nuclear norm 
encourages the solution to be low rank jl , and has concrete performance guarantees when A obeys certain 
properties 2 . When the measurements are noisy 

y = A(X o)+£, with U\\ 2 <S (12) 

we instead solve the following quadratically constrained convex optimization program 


\\ X L ( 13 ) 

subject to ||y — A(X )|| 2 < S. 


Again, this relaxed program has certain performance guarantees [37 38 when Xq is indeed low rank. 


Our results will relate the total number of samples L taken by each architecture to the rank (and other 
properties) of the signal ensemble. The goal is to come as close to RW total samples as possible, the number 
necessary even when the correlation structure is known. 


2.3 Architecture 1: Random sampling of time-dispersed correlated signals 


The architecture presented in this section, shown in Figure [5j consists of one non-uniform sampling (NUS) 
ADC per channel. Each ADC takes samples at randomly selected locations, and these locations are chosen 
independently from channel to channel. Over the time interval t E [0,1), a NUS ADC takes input signal 
x m (t) and returns the samples {x m (ti c ) : tk E T m C {0,1 /W ,..., 1 — 1 /W}. The average sampling rate in 
each channel is |T m | = U. The net action of all M NUS ADCs is to return MU random samples of the input 
signal ensemble on uniform grid. 


The sampling model is equivalent to observing L = MU randomly chosen entries of the matrix of samples 
X, defined in ©• This problem is exactly the same as the matrix-completion problem [3], which states 
that given a small number of entries of a low-rank matrix, we can fill in the missing entries under some 
incoherence assumptions on the matrix X. Since X is rank-i?, its svd is 

X = U'ZV T , (14) 

where U G 5] E and V E M VFxi? . The coherence is then defined as 


Mo = max 1 


M 


_ max 
R 1 <i<M 


I U T e, 


W 

~R 


max 

l<i<W 


\V T ei 


MW 


uv 


TII2 


(15) 


Now the matrix-completion results 39 in the noiseless case assert that if 


MCI > Cy 2 0 R(W + M) log 2 (W), 
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Figure 5: M signals recorded by the sensors are sampled separately by the independent random sampling ADCs, 
each of which samples on a uniform grid at an average rate of Q samples per second. This sampling scheme takes on 
the average a total of MQ samples per second and is equivalent to observing MO entries of the matrix X at random 


then the solution of the nuclear norm minimization program © (with A : R MxW R Mn such that A 
maps X to randomly chosen entries of X) exactly equals X with high probability. The result indicates that 
the sampling rate scales (within some log factors) with the number R of independent signals rather than 
with the total number M of signals in the ensemble. When the measurements y are contaminated with noise 


as in (12) then the the result in 37 suggest that the solution X to the optimization problem (13) satisfies 


X — X|| F < C Mo \/min(M, W)S, 


where is a constant that depends on the coherence /iq, defined in (15). 


As discussed before, the number of samples for the matrix completion increase with /ig. The coherence 
parameter is small for matrices with even distribution of energy among their entries; see, [3] for details. 
Furthermore, signals are also known to be bandlimited, which implies that Architecture 1 is more effective 


for the efficient sampling of signals dispersed across channels and time. We will show in Section [275] that 
using AVMM and filters at the front end of the sampling scheme force the signal energy to be distributed 
evenly. This will allow us to build sampling architectures that are effective uniformly for all ensembles of 
signals lying in a subspace. 


2.4 Architecture 2: The random modulator for correlated signals 


To efficiently acquire the signal ensemble lying in a subspace, the architecture 2, shown in Figure [6j follows 
a two-step approach. In the first step, each of the M signals undergo analog preprocessing, which involves 
modulation, and low-pass filtering. The modulator takes an input signal x m (t) and multiplies it by a fixed 
and known d m (t). We will take d m (t) to be a binary ±1 waveform that is constant over an interval of length 
1/W. Intuitively, the modulation results in the diversification of the signal information over the frequency 
band of width W. The diversified analog signals are then processed by an analog-low-pass filter; implemented 
using an integrator, see 28 for details. The low-pass filter only selects a frequency sub-band (or a subspace) 


of width roughly equal to Q, and as will be shown in Theorem [l] this partial information is enough for the 
signal reconstruction. The partial information suffices as the signals are scrambled using modulators before 
low-pass filtering. Note that the low-pass filter in each channel in Fig. [6] can be replaced; in general, by a 
band-pass filter, i.e., the location of the band does not matter only its width does. This also explains why 
we don’t need to know the subspace in which signals live in advance. 

In the second step, the filtered signal is sampled by an ADC in each channel at a lower rate D. The result 
in Theorem [l] asserts that D is roughly of a factor of R/M smaller than the Nyquist rate W. 
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Figure 6: The random demodulator for multiple signals lying in a subspace: M signals lying in a subspace are 
preprocessed in analog using a bank of independent modulators, and low-pass filters. The resultant signal is then 
sampled uniformly by an ADC in each channel operating at rate Q samples per second. The net sampling rate is 
L = DM samples per second. 


2.4.1 System in matrix form 


Each of the M input signals x m (£), 1 < m < M is multiplied by an independently generated random binary 
waveform d m (t ), 1 < m < M alternating at rate W. That is, the output after the modulation in the mth 
channel is 

Vm{t) = x m (t) • d m (t ), 1 < m < M, and t G [0,1). 

The y m (t ) are then low-pass filtered using an integrator, which integrates y m (t) over an interval of width 
1/D and the result is then sampled at rate D using an ADC. The samples taken by the ADC in the mth 
channel are 

rn/Pt 

y m [n]= / y m (t)dt, 1 <n<n. 

J (n —1)/0 

The integration operation commutes with the modulation process; hence, we can equivalently integrate the 
signals x m (t), 1 < m < M over the interval of width l/W, and treat them as samples Xq G M MxVF of the 
ensemble X c (t). The entries Xq [m, n] of the matrix Xq are 


X 0 [m,n\ 


rn/w 

/ x m (t)dt , 

J (n—1 )/W 
B 

E c \ m 

uj=—B 


l2ttuj 


—i2'KUjn/W 

^ 5 


where the bracketed term representing the low-pass filter 

^l2ttu)/W ^ 

is evaluated in the window uj G {—B,. where W = 2B + 1, as defined in <©• We will use an equivalent 

evaluation L[uj] of L[w\ in the window uj G {1,..., 2 B + 1}, i.e., we will use L[uj\ = L[uj], for uj = 1,..., B, 
and L[uj\ = L[uj — W + 1], for uj = B + 1,..., 2B + 1. The Fourier coefficients of C[m, uj] of X defined in (|3| 
are related to the Fourier coefficients Co[m,uj\ of Xq 



Co[m,uj\ — C[m,uj\L[uj], 1 < uj < W, 


( 16 ) 


and in time domain 

X 0 = C 0 LF h , 


( 17 ) 
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where L is a W x W diagonal matrix containing L[uj\ , 1 < cj < W as its diagonal entries, F is the W x W 
DFT matrix, and C o is the coefficients matrix with entries defined in (16). Since C o inherits its low-rank 
structure from C; therefore, Xq is also a low-rank matrix of rank R. In the rest of this write up, we will 
consider recovering the rank R matrix Xq. Since L is well-conditioned, the recovery of Xq implies the 
recovery of X in (J3|. 


In light of the W equally-spaced samples of d m (t)x m (t) are D m £C m , where x m contains the W uniformly- 
spaced samples of x m (t), and D m , as in ([7]), is a random diagonal matrix containing random binary signs 
d m [n\ along the diagonal. The binary waveform for the modulator in each channel is independently generated, 
which amounts to {An} being independent. 


The samples y m G in t G [0,1) taken by the ADC in the rath branch are 


Urn = PDmXm , 1 < 771 < M, 

where x m G are the rows of Xq defined in ( p~7| ) ; D m is the independent instantiation of W x W random 
diagonal matrix defined in 0, and corresponds to the modulator in the mth branch; and P : x W is the 
matrix for the integrator (used as low-pass filter; for more details, see [28]) that contains ones in locations 
(a,/?) G (j, Bj), for 1 < j < D, where 

Bj = {(,j ~ 1 )W/Q + 1 : jW/Q}, l<j<Sl, 


where we are assuming for simplicity that is a factor of W. Since the action of the integrator commutes 
with the action of the modulator, the operation of the integrator can be simply represented as a block- 
diagonal matrix P operating on the modulated entries of the rows of Xo, which contains the samples of the 
integrated signals. Putting it all together, the samples acquired by the ADCs can be written as a random 
block-diagonal matrix times the vector vec(Xo), formed by stacking the rows of low-rank Xq as 


~ Vi~ 


PD\ 

y m _ 


PD m . 


• vec(Xo), 


(18) 


where y G R" M is the vector containing the samples acquired by all the ADCs. We will denote by L, the 
total number of samples per second MVt taken by all the ADCs. 


2.4.2 Sampling theorem: Exact and stable recovery 

Clearly, the samples y at the ADCs are a linear transformation A of the rank -R matrix Xq 


y = A(X 0 ). 


Let Xo = UT,V t be the reduced form svd of Ao with U : M x R, V : W x R being the matrices of left 
and right singular vectors, respectively, and E : Rx R being a diagonal matrix containing singular values of 
Xq. Let {, and {ek}i<k<w be the standard basis vectors of dimensions M, and W, respectively. 
The coherences of Xq is defined as 


= L max II t/ T e^ |||, 

n R i<i<M * 112 ’ 

(19) 

y o = T max || "V AT e 7 - 1|| , 

P2 Ri<j<w" J " 2 ’ 

(20) 


(21) 

1 <j<n 
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A simple calculation shows that 1 < /if < M/R , and 1 < /if — W/R; see, [ 3 ] for details. We will only show 
here that 1 < /i§ < MW/R. Begin with 


V (LW T ,e i eJ) 2 < V max ||C/ T e i ||| • max || V T e k \ 

z —' z —' * /c 

R 2 


k~B 


k~Bj 


1 MW’ 

where the first inequality follows from the Cauchy-Schwartz’s inequality, and the last equality follows from 
the definitions in (19), and (20). The fact that fi 2 < MW/R follows by using the upper bounds on /if, and 
/if. Similarly, the lower bound is obtained by summing over j on both sides of the definition as follows 


= {UV\e,elf = ^||E7 T e 


MCI, 


3 = 1 


R 


j =1 k~Bj 


R 


i 1125 


which immediately gives /if > 1 after using the fact using the fact that /if > 1. All three coherence quantities 
take smallest values for equally dispersed singular vectors and largest values for sparse singular vectors [3]. 
In our context, this implies that the coherence parameters quantify the dispersion of the signal-ensemble 
energy across time and channels. 


Theorem 1. Suppose L = MQ measurements of the ensemble Xq are taken using (18). If 


Q > C^/ifi?max(/ifW/M,/if)log 3 (WM) 


( 22 ) 


for some f3 > 1, then the minimizer X to the problem ( jTT| ) is unique and equal to Xq with probability at 
least 1 - (WM)-P. 


The result indicates that each ADC operates at a rate Q that is smaller than the Nyquist rate IT by a 
factor of R/M. The net sampling rate L scales with the number R of independent signals rather than with 
the total number M of signals in the ensemble. Thus, the random demodulator provably acquires multiple 
signals lying in a subspace at a rate that is within log factors of the optimal sampling rate without knowing 
the subspace in advance. The coherence terms suggest that the sampling architecture is more effective for 
sampling signals with energy dispersed evenly across channels and time. 


2.4.3 Stable recovery 


In a realistic scenario, the measurements are almost always contaminated with noise, as in (12). In the case, 


when the noise is bounded, i.e., ||£||2 < S, then following the template of the proof in 37 , it can be shown 
that under the conditions of Theorem^] the solution X of ( fbij ) obeys 


\\X - X 0 ||f < C^/mm(W, M)5 , (23) 

with high probability; for more details on this, see a similar stability result in Theorem 2 in 140 . The 
above stability result is weak due to the multiplication factor ^/min(IT, M). In contrast to the optimization 


program in (13), the solution X to a slightly different optimization program: 


X = argmin x {||X||2 - 2 (y, A{X)) + A ||X||J, 


(24) 


proposed in 41 can be theoretically shown to obey essentially optimal stable recovery results. By completing 


the square, it is easy to see that the above estimator is equivalent to 


X = argmin x {||X - A*(y)\\\ 
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Taking the sub-differential dC(X) of the cost function C(X) = j[|X — A*(y)W%-\- A||XLI and using the fact 

that the estimate X is a soft thresholding 


41 


X is the minimizer if and only if 0 E dC(X ), it can be shown 
of the singular values of the matrix Xj± — A*(y) E M MxW 

X = - \}+Ui{X A )v*{X A ), 

where = max{x,0}; in addition, Ui(Xj\), and Vi(X jf) are the left and right singular vectors of the 
matrix X respectively; and cq(X^) is the corresponding singular value. 


In comparison to the estimator (24), the matrix Lasso in does not use the knowledge of the known 
distribution of A and instead minimizes the empirical risk || y — 4(X)||2 = \\y\\\ — 2(y,A(X)) + ||^4.(J^T) Hi- 
Knowing the distribution, and the fact that EA*A = 1 holds in our case, we replace ||4(X)||2, by its 
expected value E ||4(X)||2 = ||X||| i n the empirical risk to obtain the estimator in ( [24] ) . Although the KLT 
estimator is easier to analyze, and gives optimal stable recovery results in theory but it does not empirically 
perform as well as matrix Lasso. 

Before stating the stable recovery results, we introduce the statistical assumptions on the additive measure¬ 
ment noise £, which are given as 


max E exp 


& 


( 7 * 


< C 


||C||^ 2 =c^E4 = cL ( 7 2 , 


(25) 

(26) 




where fa denotes the Orlicz-2 norm of vector £ E M L that contains ^ as its entries. The choice of the 
indexing with double-index i,j will be clear in Section |5j With this the following result is in order. 

Theorem 2. Let Xo E M MxVF be a rank R matrix, and suppose that we observe yij as in (12) contaminated 
with noise such that (25) holds. Then with probability at least 1 — ( WM)~& for some /3 > 1, the solution 


X to (24) will obey 


\\x-x 0 \\ f <cuu 2 , 

for a fixed constant C, when Ll > C/dyl max(kE/M, 1) log 2 (WM) 


(27) 


Roughly speaking, the stable recovery theorem states that the nuclear norm penalized estimators are stable 


are 


in the presence of additive measurement noise. The results in Theorem [2] are derived assuming that 
random with statistics in (25). In contrast, the stable recovery results in the compressed sensing literature 


only assume that the noise is bounded, i.e., ||£||2 < S, where £ is the noise vector introduced earlier. Here, 
we give a brief comparison of the results in Theorem Q with the stable recovery results in 


37 42 


Compare the result in (23) with (27), it follows that our results improve upon the results in 37 by a factor 


of 1/(W A M). We will also compare our stable recovery results against the stable recovery results derived 
in 142]. The result roughly states if the linear operator A satisfies the matrix RIP Of and ||€|| 2 <«5, then 
the solution X to (13) obeys 

||X - X 0 || F < CS. (28) 

The above result is essentially optimal stable recovery result. In comparison to (28), the result in 


is 


also optimal, however, we prove it for a different estimator and under a statistical bound on the noise term 
11^11^2 — In addition, we also donot require the matrix RIP for A , which is generally required to prove 


optimal results of the form of (28). 


The result in Theorem [l] is more effective for incoherent X. Roughly speaking, the incoherence conditions are 
satisfied by a matrix with even distribution of energy among its entries. The incoherence conditions on the 
matrix of samples Xo when combined with the fact that the signals are also known to be bandlimited implies 
that Architectures 2 is also feasible for the efficient sampling of spread out correlated signal ensembles. In 
contrast, a universal sampling scheme will work for any ensemble of correlated signals. We can design such 
a sampling scheme by preprocessing signals in analog by components that transform (with high probability) 
matrix X to an incoherent matrix. We present such an architecture in the following section. 
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2.5 Architecture 3: Uniform sampling architectures 


The performance of Architecture 1 and 2 depends on the coherences in defined (15); and (19), (20), (21) of 
ensemble X c (t). That is, the net sampling rate depends on the energy distribution of the signal ensemble. In 
this section, we present uniform sampling architectures that sample any given ensemble of correlated signals 
with no prior requirements on the energy distribution. We will present uniform versions of Architecture 1 
and 2 shown in Figure [7| and [8] In both uniform-sampling schemes, we force the coherences to be small by 
adding a little analog preprocessing using AVMM and filters at the front end of the Architecture 1 and 2. 


/nAA/vW^ 

X 0 : 

A/WVvWljWVW 



Figure 7: Analog vector-matrix multiplier (AVMM) takes random linear combinations of M input signals to produce 
M output signals. This equalizes energy across channels. The random LTI filters convolve the signals with a diverse 
waveform that results in dispersion of signals across time. The resultant signals are then sampled, at locations selected 
randomly on a uniform grid, at an average rate Q, using a non-uniform sampling (NUS) ADC in each channel. 
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Figure 8: Analog vector-matrix multiplier (AVMM) takes random linear combinations of M input signals to produce 
M output signals. This equalizes energy across channels. The random LTI Liters convolve the signals with a diverse 
waveform that results in dispersion of signals across time. The resultant signals are then sampled uniformly at rate 
Q using the random demodulator in each channel. 

The sampling architectures shown in Figure [7| and [8] preprocess the signals in analog with an analog-vector- 
matrix multiplier that spreads energy across channels. The analog ensemble is then processed by a bank of 
random filters that spread the energy over time. The combined action of the AVMM with a random matrix 
A and the analog LTI filters with a random matrix H forces the processed output X p to be incoherent 
w.h.p. The incoherent signals are then either sampled randomly with an NUS ADC in each channel, as in 
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Architecture 1, or sampled uniformly using a modulator, an integrator, and a uniform ADC in each channel, 
as in Architecture2. 

The AVMM takes the random linear combination of M input signals to produce M output signals. The 
action of the AVMM can be modeled by left multiplication of random matrix A E R MxM with ensemble X, 
which then equalizes w.h.p., the energy in each of the channels regardless of the initial energy distribution. 
Furthermore, the all pass LTI filters convolve the signals with a diverse impulse response h c (t), which 
disperses signal energy over time w.h.p. (see Lemma flj) . We will use the same random LTI filter h c (t ) in 
each channel. The action of the random convolution [33 of h c (t) with each signal in the ensemble can be 
modeled by the right multiplication of a circulant random orthogonal matrix H E with X, assuming 

W is even; it will be clear how to extend the argument to W odd. We can write H = WQ *, where 


Q[n,(j\ = < 


_2_ cos ( 2nujn \ 

yw cos v w ) 
1 (—D k ~i 

vWsinET) 


uj = 0 

w = [1, f - 1] 

,, _ w 

id 2 

w = [f + 1,W-1] 


W[n,u>\ 


z 0 

VW 

2 

Vw 


cos 


( 2num 

V W 


z W/2( i\k-l 

Vw ^ 

2 sin f 27 Turn 


+ oJ) 

+ ®uj) , 


ud = 0 

w = [i,f-i] 

,, _ W 
W — 2 

w= [f + 1,W-1] 


(29) 


(30) 


and zo,z w /2 = ±1 with equal probability and 0 U for id = l,...,W/2 — 1 are uniform on [0,27r] and all 
W/ 2 + 1 of these random variables are independent. 

Application of the AVMM with random orthogonal A and the LTI random filters with random orthogonal 
H on input ensemble X spreads signals out across channels and over time w.h.p. As a result, we obtain 
I p ER Mxr 

Xp = AXH t = AU'£V t H t . (31) 

Let Up = AU and V p = iTV, where U p E R MxR , V p E be the left and right singular vectors of 

matrix X p , respectively. Note that matrix Xp is an isometry with X and has the same rank as X. The 
left and right singular vectors U p and Vp of Xp are in some sense random orthogonal matrices and hence, 
incoherent w.h.p. The following Lemma shows the incoherence of matrix Xp. 

Lemma 1. Fix matrices U E R MxR and V E C WxR of the left and right singular vectors, respectively, each 
of which consists of R orthogonal unit norm columns. Create random orthonormal matrices A E R MxM and 
HeR WxW . Then 


• maxi<^<M HE/peill! < Cp max (R, \ogM)/M with a probability exceeding 1 — M ~^. 

• maxi<j<w ||^p e j ||2 ^ C/3 max (_R, log W)/W with a probability exceeding 1 — W ~^. 

• maxi<i<M (Up V!, e^e*) 2 < CplogW max (R, log M) /MW with a probability exceeding 1 — 0(W~& + 

i <j<w 

M~P). 

• maxi<^<M e^e/c) 2 < Cp log W max (R, log M)/MQ with a probability exceeding 1—0 + 

1 <j<Q 3 

M~P). 


Proof of Lemma [l] is presented in Section [4j 
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2.5.1 Sufficient sampling rate for the first uniform sampling architecture 


Lemma [I] combined with the definition © shows that the coherence parameter Hq < Cp log (W) holds for 
for R > log M with high probability. Using this bound in the matrix-completion results m in the noiseless 
case asserts that if MQ > CR(W + M)log 3 (fU), the solution of the nuclear norm minimization program 
(11) exactly equals X with high probability. We are paying an extra log factor in the measurements but 
now there is no dependence on the energy distribution of the ensemble. 


2.5.2 Sufficient sampling rate for the second uniform sampling architecture 

Combining Lemma [I] with Theorem [l] immediately provides with the following corollary. 

Corollary 1 . Suppose U measurements of the ensemble Xq are taken through the uniform random demod¬ 
ulator setup. If 

n > C0R ma x{W/M, 1) log 4 (WM) (32) 

for some /3 > l, and R > logM, the minimizer X to the problem is unique and equal to Xq with 
probability at least 1 — 0{WM )~ 13 . 

Hence, we can recover X and hence, X in both uniform sampling architectures in Figure [ 7 J and [i] using the 
nuclear-norm minimization. 


3 Numerical Experiments 


In this section, we study the performance of the proposed sampling architectures with some numerical 
experiments. Since the first sampling architecture reduces to an already well-studied matrix completion 
problem, we have chosen here to present only the sampling performance of Architecture 2, which reduces to 
a matrix recovery problem from a block-diagonal measuremnt matrix that has not been studied before. 


3.1 Sampling performance 

In all of the experiments in this section, we generate the unknown rank-R matrix Xq by the multiplication 
of a tall M x R, and a fat R x W Gaussian matrices. Our objective is to recover a batch of M = 100 signals, 
with W = 1024 samples taken in a given window of time using Architecture 2 . We will use the following 
parameters to evaluate the performance of the sampling architecture: 

^ v r MQ 

Oversampling factor : n = —— -—-—, 

F 5 1 R(W + M-RY 

where the oversampling factor is the ratio between the combined sampling rate of all the ADCs in Figure [ 6 j 
and the degrees of freedom in rank-R matrix of samples Xq. The successful reconstruction is declared when 
the relative error obeys 

Relative error := < 10 -2 . 

The first experiment shows a graph, in Figure [9(a)| between the oversampling factor 77 , and R. Each point, 
marked with a black dot, represents the minimum sampling rate required for the successful reconstruction 
of a given value of rank R. The empirical probability of success for each point is 0.99. The empirical 
probability is computed over 100 iterations with a new instance of randomly generated Xq in each iteration. 
The red line shows the least-squares fit of the black points. It is clear from the plot that the for reasonably 
large values of R, the sampling rate is within a small constant of the optimal rate R(W + M — R). In 
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context of the application, and under the narrow-band assumption described in Section |1.2[ the graph in 
Figure |9(b)| shows that for a fixed number of sources R = 10, the sufficient sampling rate required for 
the successful reconstruction of the ensemble decreases inversely with increasing number M of the receiving 
antennas. Each black point gives the minimum sampling rate required for the successful reconstruction with 
probability 0.99. The red line is the least-squares fit of these marked points. In other words, Figure |9(b)| 
illustrates the relationship between the number of ADCs, or receiving antennas M, and the sampling rate 
of each of the ADC for a fixed number of sources R = 10. The important point is that as we increase the 
number of antennas the the sampling burden on each of the ADCs decreases. 


Number of source: R = 10 




(a) (b) 

Figure 9: Performance of the random demodulator for multiple signals lying in a subspace. In these ex¬ 
periments, we take an ensemble of 100 signals, each bandlimited to 512Hz. The probability of success is 
computed over 100 iterations, (a) Oversampling factor r] as a function of the number R of underlying in¬ 
dependent signals. The blue line is the least-squares fit of the data points, (b) Sampling rate versus the 
number M of recieving antennas. The blue line is the least-sqaures fit of the data points. 


3.2 Stable recovery 


In the second set of experiments, we study the performance of the the recovery algorithm when the measure¬ 
ments are contaminated with noise as in (12). The noise vector is standard Gaussian, i.e., £ ~ A/"(0, cr 2 J). 
We select 5 < a(L + y/L) 1 / 2 ; a natural choice as the condition ||£||2 < S holds with high probability. In the 
first set of experiments shown in Figure 10, we solve the optimization program in (13). The plot in Figure 
1 10(a) | shows the relationship between the signal-to-noise ratio (SNR): 


SNR(dB) = 10 log 


l o||f 


\\m 


and the realtive error(dB): 


Relative error (dB) = 10 log 


( ||X-X 0 ||| \ 

V ll^olll ) 


for a fixed oversampling factor r] = 3.5. The result shows that the relative error degrades gracefully with 
decreasing SNR. In the Figure (10(b) | the plot depicts relative error as a function of the oversampling factor 
for a fixed SNR = 40dB. The relative error decrease with increasing sampling rate. 
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Figure 10: Recovery using matix Lasso in the presence of noise. The input ensemble to the simulated random 
demodulator consists of 100 signals, each bandlimited to 512Hz with number R = 15 of latent independent 
signals.(a) The SNR in dB versus the relative error in dB. The oversampling factor r] = 3.5. (b) Relative 
error as a function of the sampling rate. The SNR is fixed at 40dB. 

4 Proof of Lemma [T] 


We start with the proof of Lemma [l] 


Proof. The point (1) is the standard result [43]. We give proof of (2) now. It is a fact that in (29) and 
(30) for fixed a and 6 ~ UniformQO, 27t]), the random variables sign(cos(a + 9)) and sign(sin(a + 6)) are 
independent of one another. Thus H has the same probability distribution as WZQ *, where Z = diag(z) 
and the entries of z are i.i.d ±1 random variables. In light of this, we will replace H with WZQ*. For a 
fixed k, we can write 


V*e k = V*H*e k = V* Zw k 
w 

= ^2 z(u>)w k {u)v u 
00= 1 


where V = Q*V and w k = W*e k and v u = V e w is the cjth row of V. We will apply the following 
concentration inequality, 

Theorem 3. Let r] G M n be a vector whose entries are independent random variables with \rj(i)\ < l, 

and let S be a fixed m x n matrix. Then for every t > 0 

P{\\Sr 1 \\ 2 >E\\Sr 1 \\ 2 + t}<2e- t2 ' 16 ^ 2 , 

where 

E ||<S , j 7 || 2 < ||S|| F . 


/V >lc 

We can apply the above theorem with S = V W k , where W k = diag(iu/ c ), and rj = z. In this case, we 
have 




w 

EKMI 2 ll«-ll2 

00=1 



w 

X>j 2 

00=1 
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and 



Thus, 


and 


v w k w<\ —\\v w = \ 


P \\\V*e k \\ 2 > ^ 2 ^+t x /4- r \ <2e~ t2 / 16 , 


pi max \\V*e k \\ 2> <2We~ t2 / 16 . 

i<k<w V W V W 


We can make this probability less than W & by taking t > CVlog W, and (2) follows. 


Now for (3), we can write H = WZQ* . Let W£ be the £th column of W* and let u* k be the fcth row of U. 

— — 

For a fixed row index k and column index we can write an entry of UV as 


(uv*^j (M) = U{WZQ*V)* 

= UQ* zw* 

= (QU k)*Zw( 
w 

= ^2(Qu k {u))* z(u)w e (u), 

OJ = l 


where Q = Q*V is a tall orthonormal matrix. Since the z(u) are i.i.d. random variables, a standard 
applications of the Hoeffding inequality tells us that 


{|(t/V*) (M)| > a} < 2e- x2/2a \ 


where 


a 2 = 


w 

= ^2 | (Qu k (u 


00= 1 


M")r 


2 \\Qu k 


< 


w 

2 \\Uk II2 

w 


Thus, with probability exceeding 1 — 2 W ^ 


max 
l<k<M 
1 <£<W 


(UV*) (M) 


2 

< 


4(/? + 2) log W 
W 


max 

1 <£<M 


|| u k 


2 

2 ' 


The point (1) tells us that 


max 
1 <k<M 


M~ 1,2 / ^ max (R, log M) 

IKII 2 < C {S - T7 - 


with probability exceeding 1 


M &. Thus (3) holds with probability exceeding 1 — 0(W 13 + M ^). 


□ 
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5 Proof of Theorem Q] 

Define a subspace T C associated with X$ with svd ^2^=1 ^kUkV^: 

T = {X : X = UZ\ + Z 2 V*, Zi g R WxR , Z 2 G r M xR }. 

The orthogonal projection of Vt onto T is 

Vt(Z) = uu*z + zvv* - uu*zvv*, 

and its orthogonal complement 

V T ± (Z) = (1- Pt)(X) = (I M - UU*)X(I W - V V*). 

A sufficient condition for the uniqueness of the minimizer to is given by the following Proposition [3||4] . 

Proposition 1. The matrix X is the unique minimizer to if3Y G Range(A*) such that 

\\r T Az)\\* - II uv* - v t (y)\\ f \\v t (z)\\ f - \\p T ±(y)\\\\p T ±(z)\u > o, 

for all Z G Null(A). 


Proposition [l] implies that the uniqueness of the minimizer is gauranteed, if 3Y G Range(*4*), such that 

\\Pt(Y)-UV*\\ f < 


9 W’ - 2 ’ 


holds. Also for Z ^ 0, and VZ E Null(*4) the following 


O 


\\r T± (z)\\ F >\/ w \\v T (z)\\ F 


is true. To show (34), for Z G Null (.A) 


0=M(Z)||f 

0>\\A(Vt(Z))\\ f -\\A(V t ±(Z))\\ f , 


which after using the fact that ||«4.|| = \fW/il implies that 


\\A(V t ,(Z))\\ f <^\\V£(Z)\\ f . 

In addition, for an arbitrary Z, we have 

\\A(Vt(Z))\\1 = (A{Vt{Z)),A(Vt{Z))) 

= (. Z,V t A*APt(Z )} 

> (1 - \\P t AA*Vt - Vt\\)\\Vt{Z) 

>\\Wt(z)\\ 2 f , 


(33) 


(34) 


(35) 


(36) 


where the last inequality is obtained by plugging in \\VtAA*Vt ~ Pt\\ < \-> which is true with probability at 
least 1 — 0(WM)~P by the application of Corollary^ using Q > Cf3R(ii\(W/M) + /i 2 ) log 2 (VPM). Collecting 
the facts in (1351) , and (|36|, the result in (1341) is obtained. 
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5.1 Measurements as a matrix trace inner product 

The (i,j)th sample taken by the ADC in the i-th branch in Figure [6] will be expressed using trace inner 
product as 


y ij = (A zj ,Xo)=tr(A* j X 0 )= ]T di[k]X 0 [i, k], (i,j) = {1,..., M} x {1,..., SI}, (37) 

keBj 


where the sampling mask A^ is 

Aij = y] di[k]eie* k . (38) 

k~Bj 

where and {ek}i<k<w are standard basis vectors of dimension M, and W , respectively. It 

follows that 

A*A(X) = ^2(A ij ,X)A ij , 

(hi) 

and 

A* A = ^2 A ij ® A^, (39) 

(hi) 

where (8) denotes the tensor product. It is clear that the measurement matrices are rank-1 random 
matrices. In general, the tensor product of rank-1 matrices U\v\, u 2 v\ with Ui G M m , and Vi G M, w is given 
by the big matrix 


u\v\ ( 8 ) u 2 v 2 = 


IXl[l]*iX 2 [l]WlW2 

u 1 [2Yu 2 [l]v 1 vl 


u 1 [1Yu 2 [2}v 1 v* 2 


Ul[N]*U 2 [l]viV 2 


and we will denote ( a , /3)th, W x W submatrix by 


u 1 [1]*u 2 [N]v 1 v* 2 ' 
ui[i]*u 2 [N]v 1 v * 2 


{«lV* ®U 2 V* 2 }( a ,p) = U 1 [a\*U 2 [/3}v 1 V2. 

Using the above notation, we can write 

{A^ ® = ei[a]ei[P] ^ di[k]di[k']e k e* k ,. (40) 

k,k'~Bj 

Taking the expectation, we can see that 

{E {A^® A^)}^ = e i [a]e i [/3] ^ e k e* k , 

k~Bj 

and using the fact that = 1 when a = /3 and is zero otherwise, we can see that 

^ E (A^ (8) A ^) = Iwm , 

(hi) 

where IVm denotes WM x WM identity matrix. In operator notation, we have E7l*7l = X. 

Let {'Uj*}i</c<M, and {v^}i<k<w denote the rows of the matrices 17, and V, respectively. The following 
quantity will be used repeatedly in the theoretical analysis 

\\V T (Aij)\\p = {P T (A^), A^ ) 

= (U*A^,U*A^) + (AijV,AijV) - (U*AijV,U*A i3 V) 

= \\U*Aij\\l + \\AijV\\l - \\U*AijV\\ 2 F < \\U*Aij\\l + \\AijV\\l. 
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Using the definition (38), we have 


\U*A tJ \\ 2 F = 


Y di w 


u t e k 


k~B, 


l i\\2 


^ ^ di [k] 


k~B, 


^ W 

- Mi m ' u ’ 


and 


WAijVWl = 

Y di[k]eiV* k 

to 

II 

w 

to to 

Y d ii k \ V k 


k~Bj 

F 

k~Bj 


This implies that 


\\V T {A l0 )\\l<ti 


R(W/M) 

n 


2 


^2 d i[ k ] V k 

k~Bj 


(41) 


5.2 Golfing scheme for the random modulator 


We start with partitioning the measurements indexed by the set 

r = {(i,i)}i<KM 
i<j<o 

into ft disjoint partitions {T ^ of size \Tk\ = T/ft, such that IJ^T/c = G he., K \^k\ = MU. We will 
construct the dual certificate Y G Range (A*) iteratively using Gross’s golfing scheme. Let A& denote the 
operator corresponding to the samples taken in the kth partition, i.e., 

A k Ak — ^ ^ A-ij 0 A-ij. 

(ij)e r k 

As will be clear later in the proof that we want the partitioned linear operator K,A k Ak to be a close approx¬ 
imation of the X. For this purpose, each of the partition T^ is chosen uniformly at random out of the set T. 
Suppose now that we form a new set of partitions {T' k } defined as 

r'fc = {(;, j) G {1,..., M} x {1,... 3 U} : S (itj) = 1}, (42) 

where the sequence {$(i,j)}i<i<M are independent 0/1 Bernoulli random variables with 

i <j<n 

In the the proofs later, we will be interested in bounding events rj(Tk) that involve sum of independent 
random matrices indexed by the partitions {T*, }!<&<«, e.g., define 


v(Xk) := 


^ ^ ft-A^j 0 A-ij X 

(hj)e r fc 


and we want to bound the probability P {r)(Tk) > e}. Uisng the fact that 


P{f l {T k )>e}<2P{f l (T , k )>e}, 


(43) 


which implies that probability of an event {77 (T/-) > e} over the set T& can be bounded by the probability of 
a similar event {77 (T' k ) > e } over the set T' k . As a result, we will now be concerned with only bounding the 
probability of events of interest over the sets T' k . Thus, we redefine A k Ak over T' k as 


AfcAk ^ ^ A%j 0 Aij ^ ^ 5(ij)Aij 0 A{j. 

r f k ( i,j) 


( 44 ) 


23 


Draft by A. Ahmed and J. Romberg - January 28, 2015 - 1:24 




























The iterative construction of the dual certificate is: 


Y k = Y k _, - KA%Ak (Vt(Y k-i) ~ UV*). 

Projecting on the subspace T on both sides results in 

r T (Y k ) = Vt{Y fc-i) - KV T A* k A k (VT(Y k ^) - UV*), 
where it is importatnt to see that Y k G Range(*4*). Now let 

W k : = V T (Y k )-UV\ (45) 


which gives 


As a result, 


W k = W k -i - KP T A%A k P T (W k -1 ). 


\\W k \\ F < \\KV T AiA k V T -VtWWW^Wf, 
and by Lemma [2] with Cl > C(3 kR(ii\{W / M) + /i|) log 2 (HEM), it follows that 

l|W„|| p <Q) IIvv*|| f 

= 2 - K VR < when « > 0.51og 2 . (46) 


which holds with probability 1 — ( WM ) P. In view of the coherences of W o = —UV* with ||C7V F *||p = R 
defined in (21), the coherence fi\ k is related to the Frobenius norm of W k as 


max 

i>3 




fellF 


k~Bi 


MCI' 


(47) 


Note that we have replaced R in the definition (15) with ||Wfc||p for proper normalization. Lemma [Z] shows 
that under appropriate conditions, the conclusion /i| k < fc _ 1 holds with high probability. This implies 
that 

< M3 (48) 

is true and this fact will be used towards the end of this proof. The iterative dual certificate 


Y = Y K = -J2^lM^Yk- i) 

k=l 


satisfies (33). To show that \\Vt-^{Y K )\\ < \ holds given Lemma |3j and (48), we make the following 


calculation 


\\V t ±(Y k )\\ < II^T-MfeA(^-i))ll = E ll?V (KA* k A k (W k - 1) - w*_i)|| 

k= 1 k=l 

< E (KA k MW k -!) - w k - 1) II < E 2_fe_1 < !/2, 

k= 1 k= 1 

which holds given Cl > C/3^R(W/M)/j^maix(W/M 1 1) log 2 (lFM), the result holds with probabiltiy at least 
1 - (WM)-?. 
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5.3 Lemmas for Theorem Q] 


We state here the key Lemmas required to prove sampling Theorem^ 


Lemma 2. Suppose f l measurements are taken through the random demodulator using the setup in (18). 


Let A%Ak ; defined in (44), be the kth partition of A*A indexed by IV, defined in (42). Then for all /3 > 1, 


\\KV T AlA k V T — Vt\\ < ^ 
provided D > C/3nR{/i\{W / M) + /i|) log 2 (WM) with probability at least 1 — (WM) 


-P 


Corollary 2. Suppose Ll measurements are taken through the random demodulator using the setup in (18) 


Let A* A be as defined in (39). Then for all /3 > 1 


\\VtA* AVt —Vt 11 — 2 

provided D > CPR{fi\{W/M ) + /i|) log 2 (WM) with probability at least 1 — (WM)~P. 

Proof. Proof of the corollary follows from the proof of Lemma [2] without partitioning, i.e., n = 1. 


□ 


Lemma 3. Suppose Q entries are observed using the random demodulator, as in (18). Let W k - i be a fixed 


M x W matrix defined in (45). Then for all /3 > 1, 

HMV4-2) (w F fe_ 1 )|| < 2 _/c_1 

with probability at least 1 — (WM)- 13 provided Ll > CfSn/j^ k _ 1 Rmax(W/M , 1) log 3 ^ 2 (WM). 
Lemma 4. Let /Xg 6e the coherence of the iterates as defined in ( |47| ). Then 

2 /I 2 

T3,k — 2^3,/c-1 

holds when TL > C ft nR(fi\(W/ M) + /x|) log(VEM) for /3 > 1 with probability at least 1 — (WM) 


-P 


5.4 Matrix Bernstein-type inequality 


We will use a specialized version of the matrix Bernstein-type inequality 4l||45 to bound the operator norm 
of the random matrices in this paper. The version of Bernstein listed below depends on the Orlicz norms 
||Z||^, a , a > 1 of a matrix Z. The Orlicz norms are deined as 


\Z\\^ a = inf^ > 0 : Eexp(- 


-) < 2 }, > 1 . 


(49) 

Suppose that, for some constant U a > 0, ||Z g ||^ Q < U( a p q = 1,..., Q then the following proposition holds. 

Proposition 2. Let Z\, Z 2 , ..., Zq be iid random matrices with dimensions M x N that satisfy E (Z q ) = 0. 
Suppose that ||Z[|^ a < oc for some a > 1. Define 


Gz — max • 


E( EZ « Z P 

q= 1 


1/2 


Q 


q= 1 


1/2 ' 


(50) 


Then 3 a constant C > 0 such that , for all t > 0, with probability at least 1 — e 

r2 


—t 


1^1 


M < C max |a zy /i + log(M + iV), U a log 1/a (f + log(M + JV)) J (51) 
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6 Proof of Lemmas for Theorem Q] 

6.1 Proof of Lemma [2] 

We want to bound the quantity 


VFk) || KPTA%AkPT ~ V T \\ — 


^ KV T {A i:j ) (g) Vr{Aij) — Ft 

(i,j)e F fc 


We are interested in the failure probability of the event F(T k ) := {r]{T k ) > (}. From (43), it is clear that 




where the set T' k , as defined in (42), is the partition generated using the Bernoulli model. Hence, it is enough 
to bound the operator norm 


'E K’PrfVj) 0 Vr(Aij) — Pr 

= 

^ K$(i,j)FT{Aij) (8) Ft{A ^) — TV 

(O)en 


(hi) 


Now the fact that the operator kFtA^A^Ft does not deviate from its expected value 

E {^VtAXAuPt) = kTt E Aij <8> A^Ft 

(hi)er' 

= kVt ^ ^ E ^(ij) E {A^ <g> Aij)VT = Ft E A*AFt = Ft 
(hi) 

in the spectral norm can be proven using the matrix Bernstein Inequality. To proceed define the operator 
Cij which maps Z to {Ft{A^), Z) Ft{A^), i.e., Cij = Ft {Aij) (8) Ft {Aij). This operator is rank one, 
therefore, the operator norm ||£*j|| = || / Pt(^-zj)IIf- Let 

Zij — ^^(hi)^D E Cij 

We also have 

E * 2 e - E£ *i) 2 = E « 2 [ E (%.i)4) - ( E %j)A,) 2 ] 

(hi) (hi) 

= E r2 e (%.j) a,) - E « 2 (e 

hi hi 

Because E C \-, and (E£^) 2 are symmetric, positive-semidefinite matrices, it follows that 


m n 

y! y! L 5(ij) (Tij — E Cij ) 

< 

y! EE Pt^dOI F ^3 

i=l i=l 


hi 


EE||Pr(A y )|££i 

hi 


Using the facts Cij = Ft {Aij) (8) Ft {Aij), J2ij E Cij = Ft, and expanding further gives 


Eyi|p r (4 lj )n 2 4- 

< 

E 

fv „2 R (W/ M ) r 

/ v Ti ^ ^i T 


2 i 

£ij 


hi 



^ hi 

k~Bj 

2 / 
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Use the notation 


then 


Pij 


£ d A k ] V k 


k~B n 


E^WPAA^WlCij 

^ 2 R(W/M) 

- Ml a 

y^ e 

+ 

e £4a,- 



i,3 


hi 


oR(W/M) 


e £4a 


i,3 


( 52 ) 


The second term in (52) can be simplified as 




'Pt'Ej'^2 (Pij( A ij ® A,)) 

i,3 





< 

e £ (/>?,■( Aj ® Aj)) 

hi 

5 


(53) 


where the last line follows form the fact that \[Pt\\ < 1. Using the definition of p?-, and (40), it is easy to 
see that the W xW submatrix at (<a, /3 )~th location is given by 


{E (p i j(A i j 0 ei[a]ei 


The following identity is very useful 


£ IKII 2 £ e e e* e + 2{vk,v k ')e k e* k , 

k^k' ~Bj 


i^Bn 


(54) 


(A-ij , A{> j /) — 0, 

which holds true when either i 7 ^ i', or/and j 7 ^ j'. Given this fact, we have 


| V" Aij 0 A^|| = max ||A^ 0 A* 

* ^ 7.4 




Using this fact, we can write 


E (p^- 0 A^-)) 


= max ||E (p?-(Aij 0 A* 




Using (|54|), we obtain 

£ E (dj(A? ® A/)) 


h3 


< 


£ IKIli £ m; 
< £ IK 


£ (v k ,v k ,)e k e* k , 


k~B, 


t-Bj 


k^k'^ 

'Bj 

£ 

+ 2IKII1 

£ 

t~Bj 



k^k' ^Bj 


= £ IKIl2 + 2^|Kll2<3^§, 

k^B, 


( 55 ) 
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where the second inequality follows from the application of Cauchy-Schwartz inequality in the second term 
of the R.H.S., the last equality is the result of the facts that 


51 ~ e 

— 1, and 

51 e* k , 

t~B 3 


k^k'^Bj 


and the last inequality follows form the definition of the coherence in (20). Plugging (55) in (52), we have 
the bound 


E 

< K 

e 51II7 , t(^)IIf Aj 

hi 


hi 


< Ck 


r^eM±jAi 

n 


(56) 


Finally, we calculate the Orlicz norm, the last ingredient to obtain the Bernstein bound. First, it is important 
to see that 

\\ZijW < K\\Cij -ECijW < 2K\\Cij\\ = 2||£y|| F = 2K\\VT(Aij)\\l, 
where the second-last equality follows form the fact that Cij is the rank-1 operator. Using the last equation, 


and (41), we have 


Ui \\Zij ||^ < 2k 




r=l \ 7 


< 


c^r^IMI + ck ]T fell! 


k~Bj 

< c^kr^-^ + c f 4 K ^. 


(57) 


Using the notation A = + //J, we obtain 

'MQ ■ ul 


A 


lh log ( ) = CkR- log(nRMA). 

(J 7 / u u 


Plugging ( [56] ) , and (57), and using t = /3log(WM) in the non-commutative Bernstein’s Inequality in Propo¬ 
sition^ we have 


M Q 

51 Zi 3 

i =1 j=l 


< 2 max{log(tUM), kR^ log(/ti?MA)(^log(tUM))} 


The claim follwis by taking t = /31og(WM), and the fact that RMA < VFM, and U > C{fi\{W/M ) 
fi^) kR/ 3 log 2 (WM) with probabiliy at least 1 — (WM)~P. 


6.2 Proof of Lemma [3] 

We will use the Bernstein bound in Proposition [2] to prove this Lemma. We want to bound 

\\{KAlA k -T)(W k _ 1 )\\ = 


(ij)er k 


{-A-ij , W k—i)Aij E ^ ^ n^Wk-i)^ 
(i,i)er fc 
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which follows from 


Wk—i E ^ ^ W 

(i,j)er k 

Using the reasoning similar to that in Lemma [2j it is clear that bounding the following sum of random 


matrices over T' k matrices 


^ ^ ft{Aij, Wk-i)Aij E ^ ^ KiAijtWk-JAi' 

(hi)£b k 

suffices. Define a zero-mean random variable as follows: 

Zij = {Aij, Wk-i)Aij Kj E (Aij , W^_i) A^j . 

The first variance term in (l50l) is 


E E z « z « = E E s «,i) 

(hi) (hi) 

- E « 2 (E%, i} ) 2 E ((Aij, W k -i)Aij) E ((Aij, Wk-i)Aij)*, 


h3 


where 

(A ij ,W k _ 1 ) = E di[y]W k -i[i,j]. 

The following can be easily verified and will be used in the proof of this Lemma 


E E %T Z b z t 

< 

E « 2 E<5 (iij) E(A y , Wk-tfAijA'j 

(hi) 


(hi) 



= 

EwEiAy.Wfe-!) 2 ^^ 




(o) 



In the calculations below, we assemble the ingredients to calculate the variance. First by (38), we have 


(Ai^W^fAijA*! = (A ij ,W k - 1 ) 2 e i e* E ^[fcR[fc']e£e^ 

7,7 ' 

= {A ij ,W k - 1 ) 2 e i e* E di[k} 2 \\e k \\ 2 2 

7 

w 
~n 


= —(A ij ,W k „ t ) 2 e i e*. 


Taking summation over i, and j on both sides and using above relation gives us 


W 

~n 


EE(A ij W fc _i) 2 e i e* 

< 

M 

E>^* 

(hi) 


i=i 


w 


max 


-E E {AijWk-i? 


3 = 1 


= ^ m f x E E w k- ih7]- 

i=i 7~^j 


This gives the first term in the variance (50) 


E EZ « Z .* 

(hi) 


<«|EE Wfc-iM<l|W fc _ 


n 




i=l 7 ~jSj 


_ (W/M) 

IT 




( 58 ) 
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where the last inequality follows from (19). The second variance term in (50) is 


E E Z'ijZij = E « 2E %,i) E {A ij ,W k . 1 ) 2 A* ij A ij - 

(hi) (hi) 

hi 


which implies that 


We begin with 


E EZ ^ 

< K 


(hi) 


(hi) 


(A ii ,Ty fc _ 1 ) 2 A* i A ii = (A ii ,W fc _ 1 ) 2 || ei ||i E ^[7]^[7']e 7 e;,. 

7,7'~23j 

The expectation of the operand on the right hand side gives 

E(A ii ,W fe _ 1 ) 2 A7A* i = (^diWWk-xii^}) ( E di[7]diMc 7 e 

V 7 / \7 

= E Wfc-iM E <M4 + 2 E ^-l^7]^-i[*,7']e 7 ey- 

7 7~Sj 7,7 f ^13j 

Next step is to take the operator norm, and summation over j on both sides. The orthogonality of 
{e 7 }i< 7 <w? and Sj D £y = 0 implies that 


Ee^^m) 2 ^^. 

i=i 


< 


< 


< max 
i 


n 



n 

E E Wfc-Ri.'y] E ~^~ e * 

+ 2 

E E W fc _i[* ) 7]W fc _ 1 [»,y]e 7 e 

i=l 7~#j 7~#j 



i=l 7,7 

E whiM E 

+ 2 max 

E W fc _ 1 [*,7]W fc _ 1 [*,7 , ]e 7 e;, 

7~#j 7 


j 

7,7'~23j 


Now using the fact that 


7,7 fr ^13j 


Summing over i, we obtain the second variance 

M 


e 7 ey 


= E wihiM 

7~23j 


E EZ « Z 

(hi) 




—^ max E "tiM 

z=l 7 ^Bj 
M 


< 3«E ll^fe-illp^h-i^ = 


— 1 IlK- 


(59) 


Z— 1 
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Given ( |58| ), and ( |59[ |, we can obtain the variance using ( |50| ). We now calculate the V ’2 norm of matrix 
(Aij,W k -i)Aij - EiAi^Wk-^A ij. Since (Aij, W k _i)Aij is a rank one matrix, this gives 

II (Aij, Wk—i)Aij — E(A^j, Wk-i)Aij ||- 0 2 < 2 \\(Aij,Wk-^AijW^ 

— 2||Aij || || (A^ , Wk-i) ||'02 

W 

= 2-\\(A ij ., w k _ 1 )U 2 . 


This gives the result 


U~2 max k || < ^(i,j) ( Aij , W k —\j A{j 5^ij^(Aij, W k — i)A ^||^ ;2 

ij 

< CK J[ m $ x W k- 1[*>7] = C K ^^- ^i,k-i\\W k -i\\l, 

1~ B 3 


and hence 


(60) 


f/ 2 iog 1/2 , x IIw fe _r|| F iog 1 / 2 (WM). 

The results in ( |58| ), ( |59| ), and (60) can be plugged in Proposition [2] to obtain 

P^(W fc _ 1 )-W fc _i||p< 

C«||WVil| F max | y ^ 3 ’^ 1 m ^ (W " /M T V/?log(^) ; ^ U 3 ’ fc ~^f ^/Uog 3 / 2 (WM) 


(61) 


with t = /3 log(TUM), which holds with probability at least 1 — (WM) ^. Using (46), it becomes clear that the 
right hand side can be controlled with high probability by selecting Q > C /? k _ 1 Rmdix(W/M, 1) log (WM). 


6.3 Proof of Lemma [4] 

Coherence of iterates W k was defined earlier in ( |47| ) . This Lemma is dedicated to showing that the coherence 
of the iterates is bounded. We start with matrix Wj , c and its coherences fi\ k and fi\ k . The iterates are 
related as 

= ^V T AlA k V T ~ V T )(W k _ i). 

Since W k G T, we start with writing out 

= KPrAlAkiWk-!) - W k _! 

= ^ K (Aij, W k -i)Vr{Aij) — Wk-t 

(hi)^r k 


The coherence fi\ k of W k is then 
o MQ 


l 118 ^ > I ^ V ft(Aij, W fc-l)[?T (Aij)] ( m?n ) [Wfe—l](m, 


^ 3,/c 1 <m<M 

i<a;<o \b,j)er fc 

We sart with bounding the following quantity with high probability 


n) 


(62) 


^7mn(r/c) • — 


^ ^ — —l](m,n) 

(*,i)er fc 


(63) 
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using the scalar Bernstein bound. Instead of bounding {? 7 mn (r/ c ) > e}, we will bound the event {rjmniT^) > 
e}, where the set T' k is selected using Bernoulli model, i.e., we will bound the following quantity 


Vmn (r 


k) 


^ ^ (-A-ij i k— 1) \Pt (-A-ij )] (jn,n) l] (m,n) 

(ij)e r fe 


Let 

Zij ? j) {Aij , TV^ — i) ( m?n ) [Wfc-l] (m| n). (^4) 

For this purpose, we need to calculate the variance 

E ^ E « 2 EJ W )(Ay, W fc _i) 2 [^T(^)]^, n) 

(hi)er', (hi) 


PMAy)] (m , n) = [C7l7*Ay] (m , n) + [Ay VV*] (m ,«) - [C/C/*Ay W] 


It follows that 


[^T(Ay)]f m , n) < 3 ([E/CTAy]? min) + [Ay VV]^ n) + [C/C/*Ay VV*]f m>n) ) . 

Using Lemma |5j we now calculate the variance term by term. The first term in the variance is 


(65) 


m n 

^E[C/C/*Ay]2 m „ ) (Ay,^_ 1 ) 2 <3^( Wm ,u i > 2 -E E^.en) 2 -max E W*-iM 

(i,j) *=1 J=1 7~Sj J 

<3|M^|,*-i^> (66) 

where the first inequality follows by the application of Lemma [6j For the second inequality, we have used 
the the definition of coherence in (21), and the fact that 


E E (®7.e«) 2 = 1 

j =1 7 

for any given index (m, n). Using Lemma [5j and [6] again, we obtain 


E E[Ay W*]( m) „ } (Ay, Wfc-i) 2 < 3E E • E <«7»«n) 2 I 

(hi) (hi) \7~#j 7~#j 

O M 

= 3max E Wfc-iM] • E E ( v ~n v n) 2 ’ E^ m ’ ei ) 2 


7~#j 


j=l 7 ~# J - 


i=l 


R 


< 3max E Wfc-iMIKIIi • l|e m ||i < 3 ^3,k-i ^IKHl- 


7 


Similarly, the last variance term is 


yE[i/c/*A i3 vr]^ n) (Ay > ^_ 1 ) 2 < 

(hi) 

m n 

<3E(«n.«i) 2 -E E (v 7 , r n ) 2 - max £ 

»=1 i=l7~Bj 7~Sj 

< 3||« ro ||i • [Kill 
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Using the fact that H^nlli < C we see that (67) dominates (68). Putting (66), (67), and (68) together, we 
have 


a Z ~ E ZjjZjj < C/c(||tA m || 2 + \\v n 11 2 )/^ 3 , k — l 

(hi) 


R 

M)' 


(69) 


Now, we need to calculate the Orlicz-1 norm max^- Using standard argumnets in probability theory, 

see 


46 , we can show that 


( \ 1/2 

||[t/t/ -^ij] (m,n) 11^2 — I ^ ' (67 i^n) I — 112 H'Wm || 2 

J 

where the first inequality follows from the fact that for a fixed (m, n) 

Y (e 7 ,e n ) 2 < 1, 

7 

and that {u m ,u z ) < ||tii|| 2 ||u m || 2 < ||«m||2, as ||-it;|| 2 < 1. Also, 

MijVV^m^hz < C(e m , Gi) ^ £ <«7»«n) 2 j < <7K|| 2 , 

the second inequality follows from the identity 

Y E ( v ~n v n) 2 = IKIll, 

j =1 7 

and that (e m ,e^) < 1 for a fixed (m, n). Similaraly, we can show that 

||[t7t7*A ii W*] (m , n) ||^ 2 < C'||u TO || 2 ||u n || 2 . 

In addition, as before, we have 

\ 1/2 

{A ij ,w k . 1 )U,<c\ Y 


Using Lemma [7| Equation (64), (65), we can compute \\Zij\\^ 1 as follows 

maxWZijWl 

IJ 

< Ck 2 max (\\[S(i,j)UU* + ||[% j) A ij VV*] (rn , n) ||$ a + \\[S {iJ) UU* A^VV*}^^) [K^, W fc _i)ll* 2 

l 3 

< Ck max (\\u n \\l + \\v m \\l) ■ ]T Wf_i[b7] 

11 * 

7~#i 

R 


ij 


- Ck (^ + ^w )^ k -1 


MCI' 


Then the Orlicz-norm term in the Bernstein bound is 

Clearly, the Orlicz-norm term dominates the variance term in the Bernstein bound. Select t = /3\og(WM ) 
in the Bernstein bound, which implies that 

\Vmn(T ' k )\ 2 < Cf3K(\\ Um \\ 2 + \\v n \\ 2 )^lk- 1 ^ 0 g 2 (WM) 
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< C/3k 


R(nl(W/M) + n 1) 2 R , 2 


n 




holds with probability at least 1 — ( WM ) $. The second inequality follows from the definitions in (19), and 
(20). Using the bound on |? 7 mn (rjj c )| 2 , we can find a bound on the coherence of the iterates in (62), which is 


/4,k (Mi'S +^§K fc -ilog 2 (^M) 




Select Q > C ft kR(h\(W/ M) + /i 2 ) log 2 (WM), we can arrange for a constant C such that 


^3,k — 9 M 3 , k-l 


1 


(70) 


holds with probability at least 1 — (WM) 


-P 


7 Auxiliary Lemmas for Theorem [T] 


Follwing Lemma will be useful in carying out several calculations in the proofs of the above lemmas. 


Lemma 5. Take Aij as defined in (38), and suppose U : M x R, and V : W x R are orthogonal matrices 
with {vi}i<i<w as rows, respectively. Then 


[UU*A tj }f mtn) = (u m , Ui ) 2 [ Y ^M(e 7 ,e n ) , 

[VV'AZj]^ = (e m , ei ) 2 [ Y diMv^vA , 

\7~Bj ) 


and 


[UU* 


AijVV’ 


} 2 (m,n) = (u m ,Ui ) 2 ( Y 




(71) 

(72) 


(73) 


Proof. We will use the definition of in ( [38] ) . Begin with 

[UU*A tj ] {min) = (UU*A ijiem e* n ) = e* m UU* Y 

7 

= Y dih^mUuie*e n = Y d^e^U Uie^e n 

7 ~Bj 7 

= (u m ,Ui) Y di[y}e*e n , 

7 


Second, 


[AijVV*]^) = (AijVV*, e m e* n ) = Y d^e^e^VV* e n 

7 

= (e m ,ei) Y dib\{v^,v n ). 

7 -B, 


Finally, 


[UVAijVV]^) = (UU*A^VV*, e m e* n ) 
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e* m UU* ^We«e; VV*e n 


= Y dih]e* m U Ui v;V*e,, 

7 ~Bj 


{Um.Ui) ^ d ih}( V 7’V r 


□ 


Lemma 6. Take Aij as defined in (38), and let x k , and y k denote scalars or vectors. Then 


E ( Y d ih}diW}x 7 x*, I I Y dil^dil^y^y*, ) = 

k 7,7 / ~23 J - J \7,7 fr ^&j 


= ( X] x kX * k J y ] + 2 Y x ^ x yy^y*y 

\l~Bj I \7 ) hi' 


Proof. The proof of this Lemma is simple involves expanding and moving the expectation inside. We will 
use the result of this Lemma in cases when x 1 , y 1 are both scalars and when one of these is a vector and 
other is a scalar. Furthermore, when both x 1 , and y 1 are scalars then the result can be simplified using 
Cauchy-Schwartz inequality to yield 

E ( S dimw]^) < 3 (xxl • 

\l,l'~13j ) ) \ 7 / V 7 / 

□ 

Lemma 7. Let X\, and X 2 be two subgaussian random variables, i.e., ||Xi||^, 2 < 00 ; and < 00 . 

Then the product X 1 X 2 is a sub exponential random variable with 

ll*i* 2 |k < c \\ xm \ x 2 u 2 . 


Proof. For a subgaussian random variable, the tail behaviour is 


P {\X\ > t} < eexp 


-ct z 


iaii 2 


Vi > 0; 


02 , 


see, for example, [46 . We are interested in 

P{|M 2 | >A}<P{|X!| >i} + P{|X 2 | >X/t} 

< e • exp (-ci 2 /||Xi||^, 2 ) + e • exp (-cA 2 /i 2 ||X 2 ||| 2 ) . 

Select i 2 = A||Xi||^ 2 /||X 2 ||^ 2 , which gives 

P{|X!X 2 | > A} < 2e-exp(- C A/||X 1 ||^||X 2 |i #2 ). 


47 


Now Lemma 2 . 2.1 in 
(1 + a)/fi. Using this result, we obtain 


imples that a random variable Z which obeys P {\Z\ > u] < ae /3n . Then HZH^ < 


||M 2 |k <c||VlWA 2 |k, 


which proves the result. 


□ 
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8 Proof of Theorem [2] 


In this section, we show that the nuclear norm penalized estimators give ideal performance for the stable 
recovery of Xq in the presence of additive measurement noise. We will consider the measurement model in 


(12), and the noise characteristics in (25). 


Proof. As will be clear later, the proof involves bounding the spectral norm: 

II®II = Mly) - EA*(y) < \\(A*A- X)(X 0 )|| + \\A *(£)|| 


(74) 


The bound on ||(4*4 — Z)(Xo)|| can be obtained directly using Corollary |3j and the quantity ||4*(£)|| * s 
controlled using Lemma [ 8 j Given these two results, the proof of Theorem [ 2 ] follows from the following main 
result in I HI. 


41 


). Let X be the solution of the ([24]) ; and Xq G matrix of rank 


Theorem 4 (Oracle inequlaity in 
R. If X> 2 1|01| ^ then 

\\X - X 0 \\ 2 f < min {2A||X 0 |U, 1.46A 2 #} . 

Corollary 3. Suppose Ll entries are observed using modulated multiplexing setup and let Z be a fixed W xM 
matrix with coherence fi 2 as defined (21). Then for all /3 > 0, 

KA*a -1) (Z )II < C\\Z\\ F] fSP ^/PiV y^log (WM) 
with probability at least 1 — provided > C/3 log 2 (WM). 

Proof. The proof of the Corollry follows from Lemma [3] by selecting the number of partitions k = 1. The 
result in the statement of the corollary is obtained from the bound in (61). In particular, the first term in 
the maximum in (61) dominates, when we select Q > C/3log 2 (WM). This proves the above corollary. □ 


Lemma 8. Let Aij be independent as defined (38) and pairs (Aij^yij) be independent. For /3 > 1 and 
Q > Cmm(W/M, 1) f3 log 2 (W M), the following 

M*(OII < cuu 2 \j log (WM), 

holds with probability at least 1 — ( WM)~P for a fixed constant C. 

Using Lemma [ 2 j and Lemma [ 8 ] we can bound ( [74] ) , and obtain 


A > 


C/3{||X 0 || 2 /i 2 max(IU/M,l) + ||||| 2 2 max(W/M, 1)} log(WM) 

n 


with probability at least 1 — 0(WM)~@ for a fixed constant C. Taking ||Xo||f = 1 without loss of generality, 
and Q > C/3/i 2 max(W/M, l)log 2 (VLM), which is in agreement with the assumptions on Q in Corollary [ 3 J 
and Lemma [ 8 ] we can control the right hand side. This proves Theorem [2] □ 

8.1 Proof of Lemma [HI 

The proof of this Lemma requires the use of matrix Bernstein’s inequality [ 2 ] As it is required to bound the 
spectral norm of the sum 4*(£) = JE j Aij , we start with the summands Zij = £ij A^. Because variables 
fij are zero mean, it follows that EZij = 0. The first quantity required is 


E EZ ^ 

= 

E E €ij ' E A ijA*j 

< max E A 2 - 

_ l<i<M J 

e E a «4 

hJ 


i,3 

1 <j<n 

hj 
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Using the definition of A{j in (38), we have 


E EZ ^ 

< max E C 2 - 

l<i<M J 

E E E d i [k}d i [k'}e i ete k ,e* 

hi 

l<j<n 

hi k,k'~Bj 




n M 



= max E £?• 

l<i<M J 

E E E«< 



1 <j<n 

j=1 k~Bj i=l 



<WmaxE4. = ffi^||£||2 2 , 


( 75 ) 


where the first equality follows from the independence of <^ 7 - and and the last equality follows from (25). 


The second quantity required to calculate the variance (50) is 


E EZ « Z * 

= 

E E ' E AC Aij 

hi 


hi 


< max E (?■ 


E EA h A « 




ikii 2 


1p2 


MCI 


ll£l| 2 


^2 


MCI 


E E E diikjdiik'je^eiel, 

i,j k,k'^Bj 
M n 

EEE ^e* k 

i=l j=1 k~Bj 


IICII 


■02 


ft 


Combining (75) and (76) and using (50) gives 


°Z = IICIU 2 


max(VE/M, 1) 

U ' 


The final quantity required is the Orlicz norm of the summand matrices Zij y i.e., 


f u\\l 2 w 


< c 


MCi n 


then 


l|Z «l», j s m hg ^ (WM) . 




(76) 


Hence, we obtain using P = OM, and t = f3\og(WM) in the Bernstein’s bound (51) 

< max |||£||^ 2 log 3 ' 2 (WM))\. 


Select U > C/3mm(W/M , 1) log 2 (lUM), then the first term dominates and the claim in Lemma [^follows. 
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